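The perception system in personalized mobile agents requires indoor scene understanding models that can understand 3D geometry, capture objectiveness, analyze human behavior, and so on. This direction, however, has not been well explored compared with models for outdoor environments (e.g., autonomous driving systems, which include pedestrian prediction, car detection, and traffic sign recognition). In this paper, we first discuss the main challenge: insufficient, or even no, labeled data for real-world indoor environments. Other challenges include fusion between heterogeneous sources of information (e.g., RGB images and LiDAR point clouds), modeling relationships among a diverse set of outputs (e.g., 3D object locations, depth estimation, and human poses), and computational efficiency. We then describe MMISM (Multi-modality input Multi-task output Indoor Scene understanding Model), which tackles these challenges. MMISM takes RGB images and sparse LiDAR points as inputs, and performs 3D object detection, depth completion, human pose estimation, and semantic segmentation as output tasks. We show that MMISM performs on par with, or even better than, single-task models. For example, we improve the baseline 3D object detection results by 11.7% on the benchmark ARKitScenes dataset.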
Translated by Google Translate
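We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera. We tackle this challenging problem with a scalable yet powerful approach in which we first optimize a latent representation that disentangles radiance fields and camera poses. This latent representation is then used to learn a generative model that enables both unconditional and conditional generation of 3D scenes. Our model generalizes previous works that focus on single objects by removing the assumption that the camera pose distribution can be shared across samples. We show that GAUDI obtains state-of-the-art performance in the unconditional generative setting across multiple datasets, and allows conditional generation of 3D scenes given conditioning variables such as sparse image observations or text that describes the scene.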
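Transformers have gained increasing popularity in a wide range of applications, including natural language processing (NLP), computer vision, and speech recognition, because of their powerful representational capacity. However, harnessing this capacity effectively requires a large amount of data, strong regularization, or both, to mitigate overfitting. Recently, the power of the Transformer has been unlocked by self-supervised pretraining strategies based on masked autoencoders, which rely on reconstructing masked inputs, either directly or contrastively from unmasked content. This pretraining strategy, used in BERT models in NLP, Wav2Vec models in speech, and, recently, MAE models in vision, forces the model to learn relationships between the content in different parts of the input using autoencoding-related objectives. In this paper, we propose a novel but surprisingly simple alternative to content reconstruction: predicting locations from content, without providing positional information for it. Doing so requires the Transformer to understand the positional relationships between different parts of the input from their content alone. This amounts to an efficient implementation in which the pretext task is a classification problem among all possible positions for each input token. We experiment on both vision and speech benchmarks, where our approach brings improvements over strong supervised training baselines and is comparable to modern unsupervised/self-supervised pretraining methods. Our method also enables Transformers trained without position embeddings to outperform ones trained with full position information.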
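The pretext task described above, classifying each token among all possible positions, can be illustrated with a minimal sketch. It assumes the model has already produced one score per (token, position) pair; the function name `position_prediction_loss` and the toy inputs are hypothetical and are not taken from the paper.

```python
import math

def position_prediction_loss(logits, true_positions):
    """Mean cross-entropy over tokens.

    logits[i][p] is the (assumed) model score that token i sits at position p;
    true_positions[i] is the index the token actually came from.
    """
    total = 0.0
    for row, target in zip(logits, true_positions):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[target]
    return total / len(logits)

# Three tokens, three candidate positions each.
uniform = [[0.0, 0.0, 0.0]] * 3            # the model has no idea: loss = ln 3
confident = [[5.0, 0.0, 0.0],
             [0.0, 5.0, 0.0],
             [0.0, 0.0, 5.0]]              # the model favors the right position

loss_uniform = position_prediction_loss(uniform, [0, 1, 2])
loss_confident = position_prediction_loss(confident, [0, 1, 2])
```

A model that cannot tell positions apart sits at the chance-level loss of ln N for N positions, so any drop below that value indicates that position is being inferred from content.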
Medical image segmentation (MIS) is essential for supporting disease diagnosis and treatment effect assessment. Despite considerable advances in artificial intelligence (AI) for MIS, clinicians remain skeptical of its utility and maintain low confidence in such black-box systems, a problem exacerbated by poor generalization to out-of-distribution (OOD) data. To move towards effective clinical utilization, we propose a foundation model named EvidenceCap, which makes the box transparent in a quantifiable way through uncertainty estimation. EvidenceCap not only makes AI visible in regions of uncertainty and on OOD data, but also enhances the reliability, robustness, and computational efficiency of MIS. Uncertainty is modeled explicitly through subjective logic theory to gather strong evidence from features. We show the effectiveness of EvidenceCap on three segmentation datasets and apply it in the clinic. Our work sheds light on clinically safe applications and explainable AI, and can contribute towards trustworthiness in the medical domain.
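To make the idea of modeling uncertainty through subjective logic concrete, here is a minimal sketch of the commonly used Dirichlet-based formulation, in which non-negative per-class evidence is turned into belief masses plus one explicit uncertainty mass. This is an assumption about the general technique, not EvidenceCap's actual implementation; the function name `subjective_opinion` and the evidence values are hypothetical.

```python
def subjective_opinion(evidence):
    """Map non-negative per-class evidence to (beliefs, uncertainty).

    Assumed formulation: Dirichlet parameters alpha_k = e_k + 1,
    strength S = sum(alpha), belief b_k = e_k / S, uncertainty u = K / S,
    so that sum(b) + u = 1.
    """
    k = len(evidence)
    strength = sum(e + 1.0 for e in evidence)
    beliefs = [e / strength for e in evidence]
    uncertainty = k / strength
    return beliefs, uncertainty

# One pixel, two classes (e.g. lesion vs. background).
beliefs_strong, u_strong = subjective_opinion([9.0, 1.0])  # plenty of evidence
beliefs_none, u_none = subjective_opinion([0.0, 0.0])      # no evidence at all
```

With no evidence the uncertainty mass is 1, and it shrinks as evidence accumulates. That property is what allows uncertain regions or OOD inputs to be flagged pixel by pixel.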
Pre-trained language models (LMs) have shown remarkable reasoning performance using explanations (or "chain-of-thought" (CoT)) for in-context learning. On the other hand, these reasoning tasks are usually presumed to be more approachable for symbolic programming. To make progress towards understanding in-context learning, we curate synthetic datasets containing equivalent (natural, symbolic) data pairs, where symbolic examples contain first-order logic rules and predicates from knowledge bases (KBs). Then we revisit neuro-symbolic approaches and use Language Models as Logic Programmer (LMLP) that learns from demonstrations containing logic rules and corresponding examples to iteratively reason over KBs, recovering Prolog's backward chaining algorithm. Comprehensive experiments are included to systematically compare LMLP with CoT in deductive reasoning settings, showing that LMLP enjoys more than 25% higher accuracy than CoT on length generalization benchmarks even with fewer parameters.
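For readers unfamiliar with the backward chaining that LMLP is said to recover, the following is a minimal sketch of the classical algorithm over Horn clauses. It is not LMLP itself: the term encoding (tuples, variables prefixed with `?`), the depth limit, and the toy family knowledge base are all assumptions made for illustration.

```python
import itertools

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def walk(term, subst):
    """Follow variable bindings until reaching a constant or an unbound variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term

def unify(a, b, subst):
    """Unify two atoms (predicate, arg1, ...); return an extended copy of subst or None."""
    if len(a) != len(b):
        return None
    subst = dict(subst)
    for x, y in zip(a, b):
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x):
            subst[x] = y
        elif is_var(y):
            subst[y] = x
        else:
            return None
    return subst

_fresh = itertools.count()

def rename(rule):
    """Give the rule's variables fresh names so separate uses do not clash."""
    n = next(_fresh)
    def fix(atom):
        return tuple(f"{t}_{n}" if is_var(t) else t for t in atom)
    head, body = rule
    return fix(head), [fix(atom) for atom in body]

def prove(goals, rules, subst, depth=8):
    """Depth-first backward chaining: yield each substitution that proves all goals."""
    if not goals:
        yield subst
        return
    if depth == 0:  # crude guard against infinite recursion
        return
    first, rest = goals[0], goals[1:]
    for rule in rules:
        head, body = rename(rule)
        new_subst = unify(first, head, subst)
        if new_subst is not None:
            yield from prove(body + rest, rules, new_subst, depth - 1)

# Each rule is (head, body); facts have an empty body.
rules = [
    (("parent", "alice", "bob"), []),
    (("parent", "bob", "carol"), []),
    (("grandparent", "?x", "?z"), [("parent", "?x", "?y"), ("parent", "?y", "?z")]),
]

query = [("grandparent", "alice", "?who")]
answers = [walk("?who", s) for s in prove(query, rules, {})]
# answers == ["carol"]
```

The search starts from the query and works backwards through rule heads to facts, which is the same goal-directed, step-by-step expansion the abstract contrasts with free-form chain-of-thought.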
Federated deep learning frameworks can be used strategically to monitor land use locally and infer environmental impacts globally. Building a global model for land use classification requires distributed data from across the world. A federated approach suits this application domain because it avoids transferring data from distributed locations and saves network bandwidth, reducing communication cost. We use a federated UNet model for semantic segmentation of satellite and street-view images. The novelty of the proposed architecture is the integration of knowledge distillation to reduce communication cost and response time. The accuracy obtained was above 95%, and we also achieved significant model compression, by factors of over 17 and 62 for street-view and satellite images respectively. Our proposed framework has the potential to be a game-changer in real-time tracking of climate change across the planet.
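The abstract does not say which form of knowledge distillation is used. As a rough illustration only, the sketch below shows the widely used soft-target variant, in which a small student is trained to match the temperature-softened output distribution of a larger teacher. The function names, the temperature value, and the logits are assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, computed stably."""
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)   # soft targets from the teacher
    q = softmax(student_logits, temperature)   # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return kl * temperature ** 2

teacher = [4.0, 1.0, 0.5]       # hypothetical per-pixel class logits
good_student = [4.0, 1.0, 0.5]
poor_student = [0.5, 1.0, 4.0]

loss_match = distillation_loss(good_student, teacher)
loss_mismatch = distillation_loss(poor_student, teacher)
```

In a federated setting, the appeal is that a much smaller student can stand in for the full model, so fewer parameters need to be exchanged per round.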
To address the complicated pathological features (blurred boundaries, severe scale differences between symptoms, background noise interference, etc.) in the joint segmentation of retinal edema lesions from OCT images, and to make the segmentation results more reliable, we propose a novel reliable multi-scale wavelet-enhanced transformer network that provides accurate segmentation results together with a reliability assessment. Specifically, to improve the model's ability to learn the complex pathological features of retinal edema lesions in OCT images, we develop a novel segmentation backbone that integrates a wavelet-enhanced feature extractor network and our newly designed multi-scale transformer module. Meanwhile, to make the segmentation results more reliable, we introduce a novel uncertainty segmentation head based on subjective logic evidential theory, which generates the final segmentation results along with a corresponding overall uncertainty evaluation score map. We conduct comprehensive experiments on the public AI-Challenge 2018 database for retinal edema lesion segmentation, and the results show that our proposed method achieves better segmentation accuracy with a high degree of reliability compared to other state-of-the-art segmentation approaches. The code will be released at: https://github.com/LooKing9218/ReliableRESeg.
Accurately predicting interactive road agents' future trajectories and planning a socially compliant and human-like trajectory accordingly are important for autonomous vehicles. In this paper, we propose a planning-centric prediction neural network, which takes surrounding agents' historical states and map context information as input, and outputs the joint multi-modal prediction trajectories for surrounding agents, as well as a sequence of control commands for the ego vehicle by imitation learning. An agent-agent interaction module along the time axis is proposed in our network architecture to better comprehend the relationship among all the other intelligent agents on the road. To incorporate the map's topological information, a Dynamic Graph Convolutional Neural Network (DGCNN) is employed to process the road network topology. Besides, the whole architecture can serve as a backbone for the Differentiable Integrated motion Prediction with Planning (DIPP) method by providing accurate prediction results and initial planning commands. Experiments are conducted on real-world datasets to demonstrate the improvements made by our proposed method in both planning and prediction accuracy compared to the previous state-of-the-art methods.
Spiking neural networks (SNNs) are a viable alternative to conventional artificial neural networks when energy efficiency and computational complexity are of importance. A major advantage of SNNs is their binary information transfer through spike trains. The training of SNNs has, however, been a challenge, since neuron models are non-differentiable and traditional gradient-based backpropagation algorithms cannot be applied directly. Furthermore, spike-timing-dependent plasticity (STDP), albeit a spike-based learning rule, updates weights locally and does not optimize for the output error of the network. We present desire backpropagation, a method to derive the desired spike activity of neurons from the output error. The loss function can then be evaluated locally for every neuron. Incorporating the desire values into the STDP weight update leads to global error minimization and increased classification accuracy. At the same time, the neuron dynamics and computational efficiency of STDP are maintained, making it a spike-based supervised learning rule. We trained three-layer networks to classify MNIST and Fashion-MNIST images and reached accuracies of 98.41% and 87.56%, respectively. Furthermore, we show that desire backpropagation is computationally less complex than backpropagation in traditional neural networks.
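Planetary rover missions must utilize machine-learning-based perception to continue extraterrestrial exploration with little to no human presence. Martian terrain segmentation is critical for rover navigation and hazard avoidance, which in turn enable further exploratory tasks such as soil sample collection and the search for organic compounds. Current Martian terrain segmentation models require a large amount of labeled data to achieve acceptable performance, and they also require retraining for deployment across different domains (i.e., different Martian rover missions) or different tasks (i.e., geological identification and navigation). This research proposes a semi-supervised learning approach that leverages unsupervised contrastive pretraining of a backbone for multi-mission semantic segmentation of Martian surfaces. The model expands upon current Martian segmentation capabilities by using a mixed-domain training set that ensures feature diversity, so that it can be deployed across different Martian rover missions for terrain navigation. Evaluation using average pixel accuracy shows that the semi-supervised mixed-domain approach improves accuracy compared with single-domain training and supervised training, reaching 97% for the Mars Science Laboratory's Curiosity rover and also improving accuracy for the Mars 2020 Perseverance rover. Furthermore, applying different weighting methods to the loss function improved the model's correct predictions for minority or rare classes by over 30%, as measured by recall, compared with standard cross-entropy loss. These results can inform future multi-mission and multi-task semantic segmentation for rover missions in a data-efficient manner.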
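The abstract does not specify which loss weighting methods were tried. The sketch below shows one common option, a class-weighted cross-entropy, together with the per-class recall metric used for evaluation. The weights, class indices, and toy predictions are hypothetical.

```python
import math

def weighted_cross_entropy(probs, labels, class_weights):
    """Weighted mean of -log p(true class) over pixels.

    probs[i][c] is the predicted probability of class c at pixel i.
    """
    num = 0.0
    den = 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]
        num += -w * math.log(p[y])
        den += w
    return num / den

def recall(predicted, labels, cls):
    """Fraction of pixels of class `cls` that were predicted as `cls`."""
    hits = sum(1 for p, y in zip(predicted, labels) if y == cls and p == cls)
    total = sum(1 for y in labels if y == cls)
    return hits / total if total else 0.0

# Four pixels: class 0 is common (e.g. soil), class 1 is rare (e.g. a big rock).
labels = [0, 0, 0, 1]
probs = [[0.9, 0.1]] * 4              # the model always leans towards class 0
predicted = [0, 0, 0, 0]

plain = weighted_cross_entropy(probs, labels, [1.0, 1.0])
weighted = weighted_cross_entropy(probs, labels, [1.0, 3.0])  # up-weight the rare class
rare_recall = recall(predicted, labels, 1)
```

Up-weighting the rare class makes the missed rare pixel dominate the loss, which pushes training towards better recall on that class even when overall pixel accuracy is already high.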